MANOVA_TEST

Overview

The MANOVA_TEST function performs Multivariate Analysis of Variance (MANOVA), a statistical procedure for comparing multivariate sample means across two or more groups. MANOVA extends univariate analysis of variance (ANOVA) to situations where there are multiple dependent variables, using the covariance between outcome variables when testing the statistical significance of mean differences.

This implementation uses the statsmodels library’s MANOVA class, which is based on multivariate regression. For more details, see the statsmodels MANOVA documentation (https://www.statsmodels.org/stable/generated/statsmodels.multivariate.manova.MANOVA.html). The function tests the null hypothesis that all group mean vectors are equal across the specified dependent variables.

The function returns four commonly used test statistics, each derived from the eigenvalues \lambda_p of the matrix A = S_{\text{model}} S_{\text{res}}^{-1}:

  • Wilks’ lambda: \Lambda_{\text{Wilks}} = \prod_p (1 + \lambda_p)^{-1}, which measures the proportion of variance not explained by group differences
  • Pillai’s trace: \Lambda_{\text{Pillai}} = \sum_p \frac{\lambda_p}{1 + \lambda_p}, generally considered the most robust to violations of assumptions
  • Hotelling-Lawley trace: \Lambda_{\text{LH}} = \sum_p \lambda_p, powerful when group differences are concentrated in one dimension
  • Roy’s greatest root: \Lambda_{\text{Roy}} = \max_p \lambda_p, most powerful when the alternative hypothesis is true for a single linear combination
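
The four formulas above can be sketched directly in Python. This is a minimal illustration that assumes the eigenvalues of A are already available as a list of non-negative numbers; the function name manova_statistics and the dictionary keys are hypothetical and are not part of MANOVA_TEST or statsmodels.

```python
import math

def manova_statistics(eigenvalues):
    """Compute the four MANOVA statistics from the eigenvalues of A (illustrative helper)."""
    lams = [float(v) for v in eigenvalues]
    if not lams or any(v < 0 for v in lams):
        raise ValueError("expected a non-empty list of non-negative eigenvalues")
    return {
        # Product of 1 / (1 + lambda_p)
        "wilks": math.prod(1.0 / (1.0 + v) for v in lams),
        # Sum of lambda_p / (1 + lambda_p)
        "pillai": sum(v / (1.0 + v) for v in lams),
        # Sum of the eigenvalues
        "hotelling_lawley": sum(lams),
        # Largest eigenvalue
        "roy": max(lams),
    }

# Made-up eigenvalues chosen so the arithmetic is easy to check by hand
stats = manova_statistics([1.0, 3.0])
print(stats)
# wilks = (1/2) * (1/4) = 0.125, pillai = 1/2 + 3/4 = 1.25,
# hotelling_lawley = 1 + 3 = 4.0, roy = 3.0
```

Note that with a single nonzero eigenvalue, as happens when only two groups are compared, all four statistics are functions of that one value.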

Each test statistic is converted to an approximate F-statistic with associated degrees of freedom and a p-value. The function rejects the null hypothesis when the smallest p-value among the reported test statistics is strictly below the specified significance level (alpha); otherwise it fails to reject.
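
The decision rule described above can be sketched as a small standalone function. The name decide and the p-values below are hypothetical placeholders for illustration; only the labels reject_null and fail_to_reject_null come from the function's output.

```python
def decide(p_values, alpha=0.05):
    """Apply the minimum-p-value rule (illustrative helper, not the library API)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    # Reject only when the smallest p-value is strictly below alpha
    return "reject_null" if min(p_values) < alpha else "fail_to_reject_null"

# Placeholder p-values, one per test statistic
example_p_values = [0.20, 0.03, 0.50, 0.04]
print(decide(example_p_values))              # smallest p is 0.03 < 0.05
print(decide(example_p_values, alpha=0.01))  # 0.03 is not below 0.01
```

Because the rule takes the minimum over several tests, a rejection can be driven by a single statistic even when the others are not significant, so it is worth inspecting the individual rows as well.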

MANOVA is particularly useful in experimental designs where multiple related outcomes are measured simultaneously, as it controls the family-wise error rate better than running separate ANOVAs. For background on multivariate analysis of variance, see the Wikipedia article on MANOVA.
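
To make the family-wise error rate concern concrete, here is a rough sketch of how the chance of at least one false positive grows when separate tests are run. It assumes the tests are independent, which is an idealization not made in the text above; correlated outcomes will behave differently. The function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Error rate for 1, 2, 3 and 5 separate tests at alpha = 0.05
for k in (1, 2, 3, 5):
    print(k, round(familywise_error_rate(0.05, k), 4))
```

Under this assumption, three separate ANOVAs at alpha = 0.05 already carry roughly a 14% chance of at least one spurious rejection.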

This example function is provided as-is without any representation of accuracy.

Excel Usage

=MANOVA_TEST(data, groups, alpha)
  • data (list[list], required): A matrix of dependent variables where rows are observations and columns are dependent variables.
  • groups (list[list], required): A column vector of group membership indicators (integer coded; non-integer values are truncated toward zero). Must have the same number of rows as data.
  • alpha (float, optional, default: 0.05): Significance level for hypothesis testing. Must be between 0 and 1 (exclusive).

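Returns (list[list]): 2D list with a header row, one row per test statistic (key, name, value, F value, numerator and denominator degrees of freedom, p-value), and a final row whose first cell holds the conclusion (reject_null or fail_to_reject_null); or an error message string.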
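
A returned 2D list can be unpacked as sketched below. The layout (header row, statistic rows, final conclusion row padded with empty strings) follows the Python code in this document; the helper name summarize is hypothetical, and the numbers in sample are placeholder values for illustration, not actual output of MANOVA_TEST.

```python
def summarize(result):
    """Split a MANOVA_TEST-style result into per-statistic dicts and a conclusion."""
    if isinstance(result, str):
        # The function reports problems as a plain error string
        raise ValueError(result)
    header, *stat_rows, conclusion_row = result
    # Key each statistic row by its first cell; map remaining cells to header names
    stats = {row[0]: dict(zip(header[1:], row[1:])) for row in stat_rows}
    return stats, conclusion_row[0]

# Placeholder result with the same shape as the function's output
sample = [
    ["test_statistic", "statistic_name", "statistic_value", "f_value", "df_num", "df_denom", "p_value"],
    ["Wilks", "Wilks' lambda", 0.25, 6.0, 2.0, 4.0, 0.0625],
    ["Pillai", "Pillai's trace", 0.75, 6.0, 2.0, 4.0, 0.0625],
    ["fail_to_reject_null", "", "", "", "", "", ""],
]

stats, conclusion = summarize(sample)
print(stats["Wilks"]["p_value"], conclusion)
```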

Examples

Example 1: Two groups with two dependent variables

Inputs:

data (col 1)  data (col 2)  groups
1             2             1
2             3             1
3             4             1
4             5             2
5             6             2
6             7             2

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;4,5;5,6;6,7}, {1;1;1;2;2;2})

Expected output:

"non-error"

Example 2: Three groups with two dependent variables

Inputs:

data (col 1)  data (col 2)  groups
1             2             1
2             3             1
3             4             1
5             6             2
6             7             2
7             8             2
9             10            3
10            11            3
11            12            3

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;5,6;6,7;7,8;9,10;10,11;11,12}, {1;1;1;2;2;2;3;3;3})

Expected output:

"non-error"

Example 3: Custom alpha value with stricter significance level

Inputs:

data (col 1)  data (col 2)  groups  alpha
1             2             1       0.01
2             3             1
3             4             1
4             5             2
5             6             2
6             7             2

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;4,5;5,6;6,7}, {1;1;1;2;2;2}, 0.01)

Expected output:

"non-error"

Example 4: Two groups with three dependent variables

Inputs:

data (col 1)  data (col 2)  data (col 3)  groups
1             2             3             1
2             3             4             1
3             4             5             1
5             6             7             2
6             7             8             2
7             8             9             2

Excel formula:

=MANOVA_TEST({1,2,3;2,3,4;3,4,5;5,6,7;6,7,8;7,8,9}, {1;1;1;2;2;2})

Expected output:

"non-error"

Python Code

import pandas as pd
from statsmodels.multivariate.manova import MANOVA as statsmodels_manova

def manova_test(data, groups, alpha=0.05):
    """
    Performs Multivariate Analysis of Variance (MANOVA) for multiple dependent variables.

    See: https://www.statsmodels.org/stable/generated/statsmodels.multivariate.manova.MANOVA.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): A matrix of dependent variables where rows are observations and columns are dependent variables.
        groups (list[list]): A column vector of group membership indicators (integer coded). Must have the same number of rows as data.
        alpha (float, optional): Significance level for hypothesis testing. Must be between 0 and 1 (exclusive). Default is 0.05.

    Returns:
        list[list]: 2D list with MANOVA results, or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    def validate_float(val, name):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(val, bool) or not isinstance(val, (int, float)):
            return f"Invalid input: {name} must be a number."
        val = float(val)
        # NaN is the only value not equal to itself
        if val != val or val == float('inf') or val == float('-inf'):
            return f"Invalid input: {name} must be finite."
        return val

    # Normalize inputs
    data = to2d(data)
    groups = to2d(groups)

    # Validate alpha
    alpha_val = validate_float(alpha, "alpha")
    if isinstance(alpha_val, str):
        return alpha_val
    if alpha_val <= 0 or alpha_val >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Validate data is a 2D list
    if not isinstance(data, list) or len(data) == 0:
        return "Invalid input: data must be a non-empty 2D list."

    for i, row in enumerate(data):
        if not isinstance(row, list):
            return f"Invalid input: data row {i} must be a list."
        if len(row) == 0:
            return f"Invalid input: data row {i} must be non-empty."

    # Get dimensions
    n_obs = len(data)
    n_vars = len(data[0])

    # Validate all rows have same length
    for row in data:
        if len(row) != n_vars:
            return "Invalid input: all rows in data must have the same length."

    # Validate all elements in data are numeric
    data_flat = []
    for i, row in enumerate(data):
        row_vals = []
        for j, val in enumerate(row):
            validated = validate_float(val, f"data[{i}][{j}]")
            if isinstance(validated, str):
                return validated
            row_vals.append(validated)
        data_flat.append(row_vals)

    # Validate groups is a column vector
    if len(groups) != n_obs:
        return f"Invalid input: groups must have {n_obs} rows to match data."

    for i, row in enumerate(groups):
        if not isinstance(row, list):
            return f"Invalid input: groups row {i} must be a list."
        if len(row) != 1:
            return "Invalid input: groups must be a column vector (each row has 1 element)."

    # Extract and validate group values
    group_vals = []
    for i, row in enumerate(groups):
        validated = validate_float(row[0], f"groups[{i}][0]")
        if isinstance(validated, str):
            return validated
        group_vals.append(int(validated))

    # Check we have at least 2 groups
    unique_groups = list(set(group_vals))
    if len(unique_groups) < 2:
        return "Invalid input: groups must contain at least 2 distinct values."

    # Check we have at least 2 dependent variables (statsmodels MANOVA rejects a single one)
    if n_vars < 2:
        return "Invalid input: data must have at least 2 dependent variables."

    # Create DataFrame
    df_data = {}
    for j in range(n_vars):
        df_data[f"DV{j+1}"] = [data_flat[i][j] for i in range(n_obs)]
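    # Store group labels as strings so the formula treats Group as a
    # categorical factor rather than a numeric covariate
    df_data["Group"] = [str(g) for g in group_vals]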
    df = pd.DataFrame(df_data)

    # Create formula
    dv_names = [f"DV{j+1}" for j in range(n_vars)]
    formula = " + ".join(dv_names) + " ~ Group"

    # Fit MANOVA
    try:
        manova = statsmodels_manova.from_formula(formula, data=df)
        results = manova.mv_test()
    except Exception as exc:
        return f"statsmodels.multivariate.manova.MANOVA error: {exc}"

    # Extract test results
    # results.results contains the test statistics
    # We need to extract Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, and Roy's greatest root
    try:
        test_results = results.results["Group"]["stat"]
    except Exception as exc:
        return f"statsmodels.multivariate.manova.MANOVA error: unable to extract results: {exc}"

    # Build output
    output = []

    # Header row
    output.append(["test_statistic", "statistic_name", "statistic_value", "f_value", "df_num", "df_denom", "p_value"])

    # Test statistics to extract
    test_stats = [
        ("Wilks' lambda", "Wilks"),
        ("Pillai's trace", "Pillai"),
        ("Hotelling-Lawley trace", "Hotelling-Lawley"),
        ("Roy's greatest root", "Roy")
    ]

    min_p_value = 1.0

    for stat_name, stat_key in test_stats:
        try:
            if stat_name in test_results.index:
                row_data = test_results.loc[stat_name]
                stat_value = float(row_data["Value"])
                f_value = float(row_data["F Value"])
                df_num = float(row_data["Num DF"])
                df_denom = float(row_data["Den DF"])
                p_value = float(row_data["Pr > F"])

                # Track minimum p-value for conclusion
                if p_value < min_p_value:
                    min_p_value = p_value

                output.append([stat_key, stat_name, stat_value, f_value, df_num, df_denom, p_value])
        except Exception as exc:
            return f"statsmodels.multivariate.manova.MANOVA error: unable to extract {stat_name}: {exc}"

    # Add conclusion row
    if min_p_value < alpha_val:
        conclusion = "reject_null"
    else:
        conclusion = "fail_to_reject_null"

    output.append([conclusion, "", "", "", "", "", ""])

    return output
